ABSTRACT


Brain computer interface (BCI) technology connects humans with machines 
via electroencephalography (EEG). The mechanism of BCI is pattern recognition, 
which proceeds by feature extraction and classification. Various feature 
extraction and classification methods can differentiate human motor 
movements, especially those of the hand. Combinations of these methods 
can greatly improve the accuracy of the results. This article explores 
the performances of nine feature-extraction types computed by a multilayer 
extreme learning machine (ML-ELM). The proposed method was tested on 
different numbers of EEG channels and different ML-ELM structures. 
Moreover, the performance of ML-ELM was compared with those of ELM, 
Support Vector Machine and Naive Bayes in classifying real and imaginary 
hand movements in offline mode. The ML-ELM with discrete wavelet 
transform (DWT) as feature extraction outperformed the other classification 
methods, with a highest accuracy of 0.98. The authors also found that 
the ML-ELM structure influenced the accuracy differently depending on the task, 
the feature extraction, and the channels used. 

1. INTRODUCTION 

Technological advances, especially in brain-computer interfaces (BCIs), play a crucial role in 
assisting patients with physical disabilities or paralysis [1]. A BCI detects brain activity in the cortex as 
electroencephalography (EEG) signals. One of the most popular BCI applications is classifying human motor 
movements, especially hand movements, for various assistive robots [2, 3]. Hand movements can be 
recognized and classified by various recognition/classification methods [4, 5] that distinguish between right- 
and left-hand movements in real time [6, 7] or in imagery [8, 9]. The pattern recognition method 
must be optimized before the EEG signal is used to drive an assistive robot, such as a robotic hand or 
a therapy robot. 

In EEG signal processing, the signal information is first extracted and then classified [9]. Among the most 
popular classification methods are the support vector machine (SVM) [10, 11], linear discriminant analysis [12, 13], 
artificial neural networks [14, 15], and fuzzy algorithms [16, 17]. The accuracy obtained with each 
classification method varies from study to study. Farooq et al. [18] used principal component analysis (PCA) 
and independent component analysis (ICA) as feature extraction with K-nearest neighbor (KNN) as the 
classifier and obtained an accuracy of 80.6%. Kevric et al. [19] also used KNN as the classifier, with 
the discrete wavelet transform (DWT) as feature extraction, and reached an accuracy of 92.8%. Deep learning 
has also been applied: Korhan et al. [20] combined common spatial pattern (CSP) filtering with 
a convolutional neural network (CNN) and reached an accuracy of 93.75%. However, the existing 
classification methods often struggle to meet the following 
requirements: (1) ease of use, (2) low computational complexity, and (3) high accuracy [1, 21]. 
One classification method that can meet these criteria is the extreme learning machine (ELM) [22]. 

The ELM was applied to EEG classification by Tan et al. [23], who reported that ELM delivers 
excellent performance compared to other classification methods. However, other studies have reported poor 
performance stability in ELM. The instability is caused by the random assignment of the weights (w) 
and thresholds between the input layer and the hidden layer [24], and it degrades 
the accuracy of ELM classification. The multilayer extreme learning machine (ML-ELM) addresses 
this problem through its multi-layered structure. In particular, the data are transformed in each layer, 
which accelerates convergence and minimizes errors [24]. Duan et al. [25] demonstrated 
the advantage of ML-ELM by using PCA and linear discriminant 
analysis (LDA) as feature extraction with ML-ELM as the classifier, obtaining an 
accuracy of 94.2%. However, Duan et al. [25] paired ML-ELM only with these 
feature extractions, even though many others, such as power spectral density (PSD) and DWT, can be used 
to improve the classification accuracy of EEG signals. 
Therefore, the authors explore various feature extraction methods employed in previous studies 
and integrate them with a multilayer (ML) ELM classifier. 

In this exploratory study, the performance of hand-movement classification is improved in several 
steps. First, the ML-ELM was paired with nine feature extractions that are widely used in EEG studies. 
After selecting the best feature extraction-classifier pairs, we tracked the performance of the 
ML-ELM classifier with different numbers of channels and different structures. To assess the 
performance of ML-ELM, the classifier was compared against the ELM, SVM, and Naive Bayes (NB) 
classifiers, each of which was also paired with the nine feature extractions. Nine feature 
extractions were selected because in our previous research the features used were limited to power 
features [23]. Feature extraction in EEG has many variations [26] that can be combined with ML-ELM. 
This opportunity is worth exploiting because different feature extraction and classification methods can pool 
their specific advantages to improve the accuracy of the result. The accuracy of 
the classification methods was evaluated by the K-fold technique. The ML-ELM outperformed the other 
classifiers. When paired with the discrete wavelet transform, ML-ELM delivered its 
highest classification accuracy of 0.98 (98%). 


2. RESEARCH METHOD 
2.1. Multilayer extreme learning machine 

As explained in the previous section, ML-ELM is an ELM with more than one hidden layer [27]. The last 
hidden layer generally runs the ELM algorithm. To qualify as ML-ELM, some articles require the ELM 
to possess more than two hidden layers [24]. ML-ELM differs from ELM in fundamental ways. ML-ELM is 
based on the ELM auto-encoder [28] and is algorithmically similar to deep learning. The ELM auto-encoder 
determines the weights (w) of the hidden layers during training with unsupervised learning. However, unlike 
deep learning, there is no fine-tuning in ML-ELM. The activation function in ML-ELM can be linear or 
nonlinear. If the number of nodes $L_k$ in the k-th hidden layer equals the number of nodes $L_{k-1}$ in the (k-1)-th 
hidden layer, g is a linear function; otherwise, g is a nonlinear function such as a sigmoid. 
The output matrix $H^k$ of hidden layer k is given by: 

$$H^k = g\left((\beta^k)^T H^{k-1}\right) \quad (1)$$

where $\beta^k$ is the output weight matrix of the k-th ELM auto-encoder. The input layer x can be considered 
as the 0-th hidden layer (i.e., k = 0). The output of the connection between 
the last hidden layer and the output node f is analytically calculated by the regularized least-squares method. 
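
The stacking described above can be illustrated with a short NumPy sketch. It is a minimal illustration under stated assumptions, not the implementation used in this study: the function names, the regularization constant `C`, the Gaussian random weights, and the one-hot target encoding are all choices made here for the example, and the orthogonalization of random weights used in some ELM auto-encoder variants is omitted.

```python
import numpy as np


def sigmoid(z):
    # Clip to keep np.exp from overflowing on large-magnitude inputs.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -50.0, 50.0)))


def ridge_solve(H, T, C):
    """Regularized least squares: minimizes ||H B - T||^2 + ||B||^2 / C."""
    return np.linalg.solve(H.T @ H + np.eye(H.shape[1]) / C, H.T @ T)


def fit_ml_elm(X, y, layer_sizes, C=1e3, seed=0):
    """X: (n_samples, n_features); y: integer class labels."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    layers, H = [], np.asarray(X, dtype=float)
    for n_hidden in layer_sizes:
        # ELM auto-encoder: random hidden layer whose output weights rebuild H.
        W = rng.standard_normal((H.shape[1], n_hidden))
        b = rng.standard_normal(n_hidden)
        beta = ridge_solve(sigmoid(H @ W + b), H, C)  # shape (n_hidden, n_prev)
        linear = n_hidden == H.shape[1]               # linear g when L_k == L_{k-1}
        Z = H @ beta.T                                # Eq. (1), samples stored as rows
        H = Z if linear else sigmoid(Z)
        layers.append((beta, linear))
    classes = np.unique(y)
    T = (y[:, None] == classes[None, :]).astype(float)  # one-hot targets
    # Output weights from the last hidden layer by regularized least squares.
    return {"layers": layers, "out": ridge_solve(H, T, C), "classes": classes}


def predict_ml_elm(model, X):
    H = np.asarray(X, dtype=float)
    for beta, linear in model["layers"]:
        Z = H @ beta.T
        H = Z if linear else sigmoid(Z)
    return model["classes"][np.argmax(H @ model["out"], axis=1)]
```

Note that no iterative fine-tuning appears anywhere: each layer is obtained from one linear solve, which is what keeps training cheap compared with back-propagated deep networks.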


2.2. Dataset and system construction 

The data in this research were provided by Hohyun Cho from the School of Electrical Engineering 
and Computer Science, Gwangju Institute of Science and Technology, South Korea [29]. The dataset consists 
of recordings from 52 subjects with two types of hand movements: real right- and left-hand movements, and imaginary right- and 
left-hand movements. The dataset contains 64 channels of EEG data. In this study, features were extracted 
from the EEG signals in this dataset by several signal processing methods. Nine popular 
feature extraction types used in existing studies were evaluated, as shown in Table 1. 


Table 1. The nine feature-extraction types employed in the present study 

| No | Feature extraction | Equation | Information |
|----|--------------------|----------|-------------|
| 1 | Bandpower [30] | $\mathrm{deltaBP} = \sum_{n=1} w(\mathrm{idxDelta})$ | idxDelta is the power at a specific frequency (bandpass). |
| 2 | Hurst coefficient [31] | $H_c = \log(R/S) / \log(T)$ | $T$ is the sampling period; $R/S$ is the rescaled range value. |
| 3 | Detrended fluctuation analysis (DFA) [32] | $F(n) = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left(X(k) - X_n(k)\right)^2}$ | $X(k)$ is the EEG time series; $X_n(k)$ is the local trend. |
| 4 | Hjorth [31] | Activity $= \sigma_0^2$; Mobility $= \sigma_1 / \sigma_0$; Complexity $= \sqrt{(\sigma_2/\sigma_1)^2 - (\sigma_1/\sigma_0)^2}$ | $\sigma_0^2$ is the average signal power or variance; $\sigma_1$ and $\sigma_2$ are the standard deviations of the first and second signal derivatives, respectively. |
| 5 | Power spectral density (PSD) [30] | $P_d(f) = \frac{1}{MU} \left\lvert \sum_{n=0}^{M-1} x_d(n)\, w(n)\, e^{-j 2 \pi f n} \right\rvert^2$; $U = \frac{1}{M} \sum_{n=0}^{M-1} \lvert w(n) \rvert^2$; $P_{\mathrm{Welch}}(f) = \frac{1}{L} \sum_{d=1}^{L} P_d(f)$ | $\{x_d(n)\}$ is a sequence with $d = 1, 2, 3, \ldots, L$ signal intervals, each of length $M$. $U$ is the normalization factor of the power in the window function. $w(n)$ denotes the window function applied to the data. |
| 6 | Mean absolute value (MAV) [33] | $\mathrm{MAV} = \frac{1}{N} \sum_{i=1}^{N} \lvert x_i \rvert$ | $x$ is a signal of $N$ samples; $i$ is the sequence index. |
| 7 | Standard deviation [33] | $S_{TD} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$ | $\bar{x}$ is the average of $x$. |
| 8 | Petrosian fractal dimension (PFD) [31] | $\mathrm{PFD} = \frac{\log_{10} k}{\log_{10} k + \log_{10}\left(\frac{k}{k + 0.4 N_\delta}\right)}$ | $k$ is the number of samples in the signal; $N_\delta$ denotes the number of sign changes in the signal derivative. |
| 9 | Discrete wavelet transform (DWT) [34] | $d_1[k] = y_{\mathrm{high}}[k] = \sum_n x[n]\, g[2k - n]$; $a_1[k] = y_{\mathrm{low}}[k] = \sum_n x[n]\, h[2k - n]$ | $x[n]$ is a discrete-time signal that is passed through a half-band high-pass filter $g[\cdot]$ and a half-band low-pass filter $h[\cdot]$. $d_1$ and $a_1$ are the level-1 detail and approximation coefficients. $y_{\mathrm{high}}[k]$ and $y_{\mathrm{low}}[k]$ are the outputs of the high-pass and low-pass filters, respectively, after subsampling. |

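
Several of the time-domain entries in Table 1 reduce to a few lines of NumPy. The sketch below is illustrative only: the function names are hypothetical, derivatives are approximated by first differences, and the Hjorth parameters follow the common convention in which mobility is the ratio $\sigma_1/\sigma_0$ and complexity is the square-root difference term.

```python
import numpy as np


def mean_absolute_value(x):
    return float(np.mean(np.abs(x)))


def standard_deviation(x):
    # Sample standard deviation with the N - 1 denominator.
    return float(np.std(x, ddof=1))


def hjorth_parameters(x):
    dx = np.diff(x)    # first derivative, approximated by differences
    ddx = np.diff(dx)  # second derivative
    var0, var1, var2 = np.var(x), np.var(dx), np.var(ddx)
    activity = var0
    mobility = np.sqrt(var1 / var0)
    # max(..., 0) guards against tiny negative values caused by rounding.
    complexity = np.sqrt(max(var2 / var1 - var1 / var0, 0.0))
    return float(activity), float(mobility), float(complexity)


def petrosian_fd(x):
    dx = np.diff(x)
    n_delta = int(np.sum(dx[:-1] * dx[1:] < 0))  # sign changes in the derivative
    k = len(x)
    return float(np.log10(k) / (np.log10(k) + np.log10(k / (k + 0.4 * n_delta))))
```

Each function takes one channel as a 1-D array, so a feature vector for a trial can be built by applying it per channel and concatenating the results.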


Prior to extraction, the EEG signal, sampled at 512 Hz, was preprocessed with an 8-14 Hz bandpass filter. 
For the present study, the authors chose the data of subjects 3, 5, 9, 14, 
15, 35, 41, 43, 46, and 52 from the provided dataset. The preprocessed and extracted EEG features were then 
classified by several classification methods applied in the literature. The classification methods 
are required to discriminate between right- and left-hand movements. This study compared the results of 
four classification methods: ELM, ML-ELM, SVM, and NB. Figure 1 is a flowchart of the work process. 
The proposed system is implemented in Python 3.6.5. 

Figure 1. Methods adopted in the present research 
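
The 8-14 Hz preprocessing step can be sketched as follows. The filter design used in the study is not stated, so this example substitutes a simple FFT mask that zeroes every frequency bin outside the pass band, which is workable for offline data; `fft_bandpass` and its parameters are hypothetical names chosen for the illustration.

```python
import numpy as np


def fft_bandpass(x, fs=512.0, low=8.0, high=14.0):
    """Zero all frequency bins outside [low, high] Hz along the last axis.

    x may be a single channel (n_samples,) or an array (n_channels, n_samples).
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum = np.fft.rfft(x, axis=-1)
    spectrum[..., (freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(spectrum, n=n, axis=-1)
```

A brick-wall mask like this has no transition band and can ring at the edges of a trial; an IIR or FIR design would be the usual choice when the filter has to run online.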


2.3. Testing methods and parameters 

The method was tested for different numbers of channels and different structures of the ML-ELM. 
The testing methods are detailed below: 
Testing Method 1: The dataset was classified by each feature extraction-classifier pair. Altogether, there were 36 
pairs (nine feature extractions combined with four classifiers). Each pair was 
required to classify the five selected dataset subjects on 3, 5, 9, 15, 41, or 64 channels. The 
channels were as follows: 3 channels: C3, Cz, C4; 5 channels: C3, C1, Cz, C2, C4; 9 channels: C3, C1, 
Cz, C2, C4, Fz, FCz, CP2, Pz; 15 channels: FC3, FC1, FCz, FC2, FC4, C3, C1, Cz, C2, C4, CP3, CP1, CPz, 
CP2, CP4; 41 channels: AF3, AFz, AF4, F5, FC3, FC1, FCz, FC2, FC4, FC6, C5, C3, C1, Cz, C2, C4, C6, CP5, 
CP3, CP1, CPz, CP2, CP4, CP6, P5, P3, P1, Pz, P2, P4, P6, PO3, POz, PO4; and 64 channels: all channels in 
the dataset. The number of neurons was 1000 in the ELM classifier and 1000 in each layer (five layers) of 
the ML-ELM classifier. 
Testing Method 2: Using the two feature extractions yielding the highest accuracy and the two 
best-performing channel counts, the system was tested for different structures of the ML-ELM. Starting from 
two layers, the ML-ELM was grown to a maximum of eight hidden layers with a maximum neuron number 
of 100 per layer. Before a new hidden layer was added, the number of neurons in the current layer was 
increased in steps of ten. All tests were performed on the ten subjects selected from the dataset. The results in the following 
section report the average accuracy over all subjects and all channels used, as obtained in the 5-fold evaluation. 
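
The test grid and the 5-fold evaluation can be sketched in plain Python. This is an illustration under assumptions rather than the evaluation code of the study: the shuffling, the seed, and the `fit_predict` callable (any function that trains on the training split and returns predictions for the test split) are hypothetical.

```python
import random
from itertools import product

FEATURES = ["Bandpower", "Hurst", "DFA", "Hjorth", "PSD", "MAV", "STD", "PFD", "DWT"]
CLASSIFIERS = ["ML-ELM", "ELM", "SVM", "NB"]
CHANNEL_COUNTS = [3, 5, 9, 15, 41, 64]

# Nine feature extractions x four classifiers = 36 pairs.
PAIRS = list(product(FEATURES, CLASSIFIERS))


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def k_fold_accuracy(fit_predict, X, y, k=5, seed=0):
    """Mean accuracy of fit_predict(X_train, y_train, X_test) over k folds."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        predicted = fit_predict([X[i] for i in train],
                                [y[i] for i in train],
                                [X[i] for i in test])
        hits = sum(p == y[i] for p, i in zip(predicted, test))
        scores.append(hits / len(test))
    return sum(scores) / k
```

Looping `k_fold_accuracy` over `PAIRS` and `CHANNEL_COUNTS` for each subject reproduces the shape of Testing Method 1, with one accuracy per pair, channel count, and subject.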


3. RESULTS AND DISCUSSION 
This section reports and discusses the results under the test scenarios described in the previous section. 


3.1. Testing method 1 

The results of the ML-ELM, ELM, SVM, and NB methods are shown in Figures 2, 3, and 4. 
Figures 2 and 3 show the performance of each feature extraction-classifier pair. The values 
in these figures were accumulated over all subjects and all channel configurations 
and then averaged per feature extraction and classifier. The results indicate that PSD and DWT 
perform better than the other feature extractions, for both the ELM and ML-ELM 
classifiers. For the movement task, PSD achieved its best accuracy, 0.92, in combination with ML-ELM, 
whereas DWT achieved its best accuracy, 0.95, with ELM. 

For the imagery task, ML-ELM achieved the best accuracy with PSD, 0.93. With DWT, 
ML-ELM also performed best, with an accuracy of 0.96, while ELM reached 0.958. To compare 
the classifiers head-to-head on each task, 
the accuracies over the nine feature extractions were summed for each classifier. The 
totals are shown in Figure 4. The results show that ML-ELM performs best 
among the four classifiers. The total accuracy obtained by ML-ELM over the nine feature 
extractions is 5.47 for the movement task and 5.28 for the imagery task. Even so, ELM performs almost 
as well as ML-ELM, with totals of 5.40 for imagery and 5.25 for movement. 

Figure 4. Total accuracy of each classifier in each task 
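
The two strongest feature extractions in this test, PSD and DWT, can be sketched directly from the equations in Table 1. The sketch makes several assumptions that the text does not specify: a Hann window and non-overlapping segments for the Welch estimate, the Haar wavelet for the DWT, and sub-band energies as the summary statistic. All function names are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)
H_LOW = np.array([1.0, 1.0]) / SQRT2    # Haar low-pass filter h
G_HIGH = np.array([-1.0, 1.0]) / SQRT2  # Haar high-pass filter g


def welch_psd(x, segment_length, fs=512.0):
    """Average windowed periodograms over L non-overlapping segments of length M."""
    x = np.asarray(x, dtype=float)
    M = segment_length
    L = len(x) // M
    w = np.hanning(M)                    # window w(n); Hann is an assumption
    U = np.mean(np.abs(w) ** 2)          # U = (1/M) * sum |w(n)|^2
    segments = x[:L * M].reshape(L, M)   # rows are x_d(n), d = 1..L
    P_d = np.abs(np.fft.rfft(segments * w, axis=1)) ** 2 / (M * U)
    return np.fft.rfftfreq(M, d=1.0 / fs), P_d.mean(axis=0)


def haar_dwt(x):
    """One DWT level: filter by convolution, then keep every second sample."""
    x = np.asarray(x, dtype=float)
    approx = np.convolve(x, H_LOW)[1::2]
    detail = np.convolve(x, G_HIGH)[1::2]
    return approx, detail


def haar_wavedec(x, levels):
    """Return [a_L, d_L, ..., d_1] by re-applying the DWT to the approximation."""
    details, approx = [], np.asarray(x, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.insert(0, detail)
    return [approx] + details


def subband_energies(x, levels=3):
    """Energy of each coefficient vector, usable as a compact DWT feature."""
    return [float(np.sum(c ** 2)) for c in haar_wavedec(x, levels)]
```

With even-length inputs the Haar transform is orthonormal, so the sub-band energies add up to the energy of the original signal, which makes them easy to sanity-check.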


3.2. Testing method 2 

This test was performed for different structures of ML-ELM to determine 
the effect of changing the number of hidden layers and neurons on the accuracy. The results 
are presented in Figures 5 and 6. Labels such as "PSD-I-15" give the feature extraction type, the type of 
movement ("I" for imagery or "M" for movement), and the number of channels (15 or 41). The numbers along 
the x-axis encode both the number of hidden layers and the number of neurons: the tens digit indicates the 
number of hidden layers and the units digit indicates the number of neurons. For example, 23 denotes three 
hidden layers with 1000 neurons each and 30 neurons in the fourth layer. The selected numbers of channels (15 and 41) 
provided the highest accuracy in the previous test. The results of the present test are the average accuracies 
over the ten selected subjects. 


Figure 5. DWT performance in structure change of ML-ELM (panel title: DWT Performance; x-axis: Structure Modification) 
Figure 6. PSD performance in structure change of ML-ELM (panel title: PSD Performance; x-axis: Structure Modification) 

The results indicate that the ML-ELM structure has a significant effect on the 
accuracy obtained. This suggests that the ML-ELM structure needs to be tuned for each type of 
task, feature extraction, and number of channels to get the best performance. For example, for the movement 
task with DWT feature extraction and 41 channels, the best accuracy averaged over the 
10 subjects is obtained with the 7th ML-ELM structure, [1000, 700], 
that is, 1000 neurons in the first layer and 700 in the second. The accuracy fluctuates across 
the other structures. The lowest value, 0.9, is obtained with the 49th structure (1000 neurons for layers 1 to 5 
and 900 neurons for layer 6). For the imagery task with DWT feature 
extraction and 41 channels, the best structure is number 17 (1000 neurons for layers 1 to 2 and 700 
neurons in layer 3) with an accuracy of 0.984. With a different number of channels, 
the best structure also differs. For example, with 15 channels, DWT on the 
imagery task performs best with structure number 6, and on the movement task 
with structure number 19. The same holds for PSD feature extraction across 
different numbers of channels and tasks. Overall, however, DWT still 
outperforms PSD. The fluctuating 
accuracy values can be attributed to underfitting and overfitting, as can be observed in Figures 5 
and 6. If the accuracy improves as the structure grows, the system 
is underfitting. Conversely, if growing the structure decreases the accuracy, the system 
is overfitting. 
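
The structure labels quoted above can be decoded mechanically. The sketch below assumes, from the three examples in this paragraph (structures 7, 17, and 49), that the tens digit plus one gives the number of full 1000-neuron layers and that each unit step adds 100 neurons to the newest layer. `decode_structure`, `full`, and `step` are illustrative names, and `step` can be lowered if a finer increment was used.

```python
def decode_structure(label, full=1000, step=100):
    """Map a structure number from the x-axis to a list of layer sizes.

    Assumption: label = 10 * a + b means (a + 1) layers of `full` neurons
    followed, when b > 0, by one layer of b * step neurons.
    """
    tens, units = divmod(label, 10)
    layers = [full] * (tens + 1)
    if units:
        layers.append(units * step)
    return layers
```

Under this reading, the best movement-task structure (7) has two layers and the worst (49) has six, so the accuracy does not grow monotonically with depth.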


4. CONCLUSION 

The results confirmed the superior classification performance of the multilayer extreme learning 
machine (ML-ELM) over ELM, Support Vector Machine, and Naive Bayes on almost every feature 
extraction evaluated. Moreover, ML-ELM delivered its best results with DWT as feature extraction. 
The accuracy of ML-ELM depended on the number of channels used, the feature extraction, and the task. 
However, the accuracy was sometimes degraded by underfitting and overfitting, causing accuracy 
fluctuations across the different structures. In this article, the combinations of feature extractions 
and classification methods were somewhat limited. In future research, the underfitting and overfitting 
problem will be addressed by combining ML-ELM with optimization algorithms. A second line of research 
concerns combinations of feature extractions: instead of a single feature extraction, ML-ELM can be 
combined with several. This is expected to improve the accuracy by making the data easier to 
differentiate. Finally, one could apply additional filters. 